Image Input Modality is not Enabled!
When trying to read a PDF document or an image, the model returns the error "Failed to generate content" in both AI Studio and the google-genai library.
Is support for documents and images not yet enabled for this model?
Hi @moayrot ,
Yes, the gemma-3n-E4B-it-litert-preview model is designed to support image input and is intended to be multimodal (text, image, and eventually audio/video). However, as a preview model it can have specific nuances or limitations in its current implementation, especially when used through tools such as AI Studio or the google-genai library.
Gemma 3n models expect images to be normalized to specific resolutions (256x256, 512x512, or 768x768) and encoded into a fixed number of tokens. If an image is not pre-processed correctly for the API or library, it may be rejected, so make sure your images are resized to one of the supported resolutions before passing them.
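As an illustrative sketch of that pre-processing step (the supported sizes come from the note above; the "smallest covering size" rule and the helper name are my own assumptions, not documented requirements):

```python
SUPPORTED_SIZES = (256, 512, 768)  # square resolutions Gemma 3n expects


def pick_target_size(width: int, height: int) -> int:
    """Pick the smallest supported square resolution that covers the
    image's longest side, falling back to the largest supported size."""
    longest = max(width, height)
    for size in SUPPORTED_SIZES:
        if longest <= size:
            return size
    return SUPPORTED_SIZES[-1]


# With Pillow installed (pip install pillow), you would then resize
# before sending the image to the model:
#   from PIL import Image
#   img = Image.open("photo.jpg")
#   side = pick_target_size(*img.size)
#   img = img.convert("RGB").resize((side, side))
```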
Also review the google-genai library documentation and the AI Studio examples specifically for Gemma 3n image input, as particular methods or input structures may be required. Make sure you are using the latest version of the google-genai library and that AI Studio is up to date.
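As a minimal sketch of image input with the Python SDK (the model id, file name, and target resolution here are placeholders I am assuming; check the current google-genai documentation and AI Studio for the exact model id and contents format):

```python
import os


def describe_image(image_path: str, model: str = "gemma-3n-e4b-it") -> str:
    """Send a resized image plus a text prompt to the model.

    Assumes `pip install google-genai pillow` and a GOOGLE_API_KEY
    environment variable. The model id is a placeholder; use the
    exact id shown in AI Studio for the preview model.
    """
    # Lazy imports so the sketch degrades gracefully when the
    # third-party packages are not installed.
    from google import genai  # google-genai SDK
    from PIL import Image     # Pillow

    client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])

    # Pre-resize to one of the supported resolutions before sending.
    image = Image.open(image_path).convert("RGB").resize((512, 512))

    response = client.models.generate_content(
        model=model,
        contents=[image, "Describe this image."],
    )
    return response.text


if __name__ == "__main__":
    print(describe_image("sample.png"))
```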
Refer to the official Gemma 3n documentation on Google AI for Developers and the Hugging Face model card for gemma-3n-E4B-it-litert-preview; these are the most up-to-date, authoritative sources on the model's current capabilities, input requirements, and any known limitations of the preview version.
Kindly try this and let us know; if you have any further concerns, we will assist you. Thank you.